Add list_file() functional API to FSSpecFileLister and IoPathFileLister #463
Hi @xiurobert! Thank you for your pull request and welcome to our community.
Action Required
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with
If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
Additional notes: Perhaps
Sorry, the functional API refers to functional_datapipe. See ref:
data/torchdata/datapipes/iter/load/fsspec.py
Line 104 in ec83d11
@functional_datapipe("open_file_by_fsspec")
By adding this decorator to the class, we are able to invoke the class through a functional call.
@functional_datapipe("list_file_by_fsspec")
class FSSpecFileListerIterDataPipe(IterDataPipe[str]):
    ...

dp = IterableWrapper(["file://folder", ])
dp = dp.list_file_by_fsspec()  # Functional API here
list(dp)  # return list of files in folder
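For readers unfamiliar with the mechanism, here is a minimal sketch of how a registering decorator can expose a class as a method on upstream pipes. It is a simplified stand-in written for illustration, not the actual implementation behind functional_datapipe: `MiniPipe`, `functional`, `Wrapper`, and `Upper` are all hypothetical names.

```python
import functools
from typing import Callable, Dict, Iterable, Iterator


class MiniPipe:
    """Hypothetical base class that keeps a registry of functional names."""

    _functions: Dict[str, Callable[..., "MiniPipe"]] = {}

    def __getattr__(self, name: str):
        # Called only when normal attribute lookup fails. If the name was
        # registered, return a callable that builds the registered pipe
        # with this pipe bound as its first argument.
        if name in MiniPipe._functions:
            return functools.partial(MiniPipe._functions[name], self)
        raise AttributeError(f"{type(self).__name__!r} has no attribute {name!r}")


def functional(name: str):
    """Hypothetical decorator: register a pipe class under a method name."""

    def decorator(cls):
        MiniPipe._functions[name] = cls
        return cls

    return decorator


class Wrapper(MiniPipe):
    """Wraps a plain iterable so it can start a chain."""

    def __init__(self, items: Iterable[str]) -> None:
        self.items = list(items)

    def __iter__(self) -> Iterator[str]:
        return iter(self.items)


@functional("upper")
class Upper(MiniPipe):
    """Upper-cases every string coming from the source pipe."""

    def __init__(self, source: MiniPipe) -> None:
        self.source = source

    def __iter__(self) -> Iterator[str]:
        for item in self.source:
            yield item.upper()


# Class form and functional form build the same pipeline.
result = list(Wrapper(["a", "b"]).upper())
same = list(Upper(Wrapper(["a", "b"])))
```

The point of the pattern is that the class constructor and the chained method call are interchangeable, which is what this PR is asked to provide for the two file listers.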
Sorry, my bad. Must have misunderstood. Will make the changes.
@ejguan updated it along with relevant tests
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
All tests are passing on my machine so far.
Thank you so much. Overall, LGTM with two minor comments.
Thanks for working on this!
Question: Do you plan on working on pytorch/pytorch#78263 as well? I see an issue but not a PR for PyTorch Core.
Yep, I do intend to do so. I opened the issue because the CONTRIBUTING.md file specifies that I should open one before working on any "features".
@NivekT is it ok if I directly open the pull request with PyTorch core? Their contributing guidelines mention that I should open an issue before implementing a feature.
Just saw the PyTorch FAQ; I could probably open the PR since it's a small change.
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Thank you, LGTM
Will the APIs in torchdata be updated with the new grammar as well?
Awesome.
There are a few bugs in your tests.
Could you please also add tests for list_files_by_iopath in
Lines 655 to 673 in aab67f3
@skipIfNoIoPath
def test_io_path_file_lister_iterdatapipe(self):
    datapipe = IoPathFileLister(root=self.temp_sub_dir.name)

    # check all file paths within sub_folder are listed
    for path in datapipe:
        self.assertTrue(path in self.temp_sub_files)

@skipIfNoIoPath
def test_io_path_file_lister_iterdatapipe_with_list(self):
    datapipe = IoPathFileLister(root=[self.temp_sub_dir.name, self.temp_sub_dir_2.name])

    file_lister = list(datapipe)
    file_lister.sort()
    all_temp_files = list(self.temp_sub_files + self.temp_sub_files_2)
    all_temp_files.sort()

    # check all file paths within sub_folder are listed
    self.assertEqual(file_lister, all_temp_files)
datapipe = IterableWrapper(["file://" + self.temp_sub_dir.name, "file://" + self.temp_sub_dir_2.name])
datapipe = datapipe.list_files_by_fsspec()
res = list(datapipe).sort()
self.assertEqual(res, temp_files)
It seems the test is still failing here. Could you please fix it by mimicking the test case from line 80 to line 89?
I believe they are here https://github.com/pytorch/data/pull/463/files#diff-6e69ca11dfe73793a94592ec3e4a303e6807afa8a7fed4d88168b50d9be829e3R663-R667
You can run your test on your local machine via
Changes made @ejguan
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Thank you, LGTM
Co-authored-by: Kevin Tse <NivekT@users.noreply.github.com>
Co-authored-by: Erjia Guan <68879799+ejguan@users.noreply.github.com>
I completely forgot that sort() sorts in place and returns None, oops
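A small reproduction of that pitfall, using made-up file names rather than the temporary files from the test suite: assigning the result of `list.sort()` always yields `None`, so an assertion against it cannot pass. Either call `sort()` on its own line or use `sorted()`, which returns a new list.

```python
# Hypothetical stand-in for the paths a file lister would yield.
listed = ["folder/b.txt", "folder/a.txt", "folder/c.txt"]
expected = ["folder/a.txt", "folder/b.txt", "folder/c.txt"]

# Buggy pattern: list.sort() sorts in place and returns None.
res_buggy = list(listed).sort()

# Fix 1: sorted() returns a new sorted list.
res_sorted = sorted(listed)

# Fix 2: sort in place on a separate line, then use the list itself.
res_in_place = list(listed)
res_in_place.sort()
```

Both fixes produce a list that can be compared against the expected paths, whereas the buggy form compares `None` against them.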
Did this work? I squashed the commits locally and force-pushed.
Force-pushed from 0726935 to ff13537
It works now. Let me import your PR. Don't worry about squashing the commits. We will do it automatically.
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Great.
…er (pytorch#463)
Summary: Fixes pytorch#387
### Changes
- Adds `list_file()` method on `IoPathFileListerIterDataPipe`
- Adds `list_file()` method on `FSSpecFileListerIterDataPipe`
- Add tests for those methods
#### Additional comments
I feel as if the implementation is quite naive. Would appreciate any feedback on it.
Pull Request resolved: pytorch#463
Reviewed By: NivekT
Differential Revision: D36777142
Pulled By: ejguan
fbshipit-source-id: 1c4474776f3fcd377ae545bd8bd7bf26d0b2fa88
Fixes #387
### Changes
- Adds `list_file()` method on `IoPathFileListerIterDataPipe`
- Adds `list_file()` method on `FSSpecFileListerIterDataPipe`
- Add tests for those methods
#### Additional comments
I feel as if the implementation is quite naive. Would appreciate any feedback on it.